An Iterative Method for Distributed Database Design
Abstract
The development of a distributed database system requires effective solutions to many complex and interrelated design problems. The cost dependencies between query optimization and data allocation on distributed systems are well recognized but little understood. We investigate these dependencies by proposing and analyzing an iterative heuristic that provides an integrated solution to the query optimization and data allocation problems. The optimization heuristic iterates between finding minimum-cost query strategies and minimum-cost data allocations until a local minimum for the combined problem is found. A search from the point of convergence then efficiently scans the optimization search space for lower-cost solutions. Parametric studies within a simple query environment demonstrate near-optimal performance for the iterative method when minimizing the total time and the response time of queries. The iterative method provides clear improvements over alternative solution methods. The paper concludes with the practical implications of this research and its future directions.

1. Design Optimization in Distributed Database Systems

Increasingly, organizations are interconnecting computers for cooperative processing and using distributed database systems to control access to their decentralized information resources. The development of a distributed database system requires effective solutions to many complex and interrelated design issues, including network topology, hardware allocation, data partitioning, data allocation, query optimization, data replication, concurrency control, reliability, and recovery [Ozsu and Valduriez 1991]. To use distributed database systems most effectively, organizations need practical design methods that integrate multiple design issues to achieve efficient overall system performance. While much research has been conducted on individual distributed design problems, little progress has been made toward integrating them.
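The alternating structure of the iterative heuristic summarized in the abstract can be illustrated with a small Python sketch. It rests on a deliberately simple, hypothetical cost model that is not the paper's: each relation has exactly one copy, a query's "strategy" is just the single site where its relations are gathered and processed, and cost is frequency times data volume moved across a uniform network. The functions `optimize_queries`, `optimize_allocation`, and `iterate` are illustrative names. Because each half-step can only lower or keep the combined cost, the loop stops at a local minimum; the paper's further search from convergence is not shown.

```python
# Toy sketch of the alternating loop. Hypothetical model, not the paper's:
# one copy per relation, a query's strategy is a single processing site,
# and cost = frequency * (units of data moved * per-unit link cost).

def link_cost(a, b):
    # Assumed uniform network: any inter-site transfer costs 1.0 per unit.
    return 0.0 if a == b else 1.0


def query_cost(q, proc_site, alloc, sizes):
    # Ship every accessed relation to proc_site, then ship the result home.
    ship = sum(sizes[r] * link_cost(alloc[r], proc_site) for r in q["rels"])
    result = q["result_size"] * link_cost(proc_site, q["origin"])
    return q["freq"] * (ship + result)


def total_cost(queries, strategy, alloc, sizes):
    return sum(query_cost(q, p, alloc, sizes) for q, p in zip(queries, strategy))


def optimize_queries(queries, alloc, sizes, sites):
    # Query optimization step: the data allocation is held fixed.
    return [min(sites, key=lambda s: query_cost(q, s, alloc, sizes)) for q in queries]


def relation_cost(r, site, queries, strategy, sizes):
    # Shipping cost caused by relation r if stored at `site`, strategies fixed.
    return sum(q["freq"] * sizes[r] * link_cost(site, p)
               for q, p in zip(queries, strategy) if r in q["rels"])


def optimize_allocation(queries, strategy, sizes, sites):
    # Data allocation step: strategies are held fixed. With one copy per
    # relation the cost separates, so each relation is placed independently.
    return {r: min(sites, key=lambda s: relation_cost(r, s, queries, strategy, sizes))
            for r in sizes}


def iterate(queries, alloc, sizes, sites, max_rounds=100):
    strategy = optimize_queries(queries, alloc, sizes, sites)
    best = total_cost(queries, strategy, alloc, sizes)
    for _ in range(max_rounds):
        new_alloc = optimize_allocation(queries, strategy, sizes, sites)
        new_strategy = optimize_queries(queries, new_alloc, sizes, sites)
        cost = total_cost(queries, new_strategy, new_alloc, sizes)
        if cost >= best:  # no strict improvement: a local minimum
            break
        alloc, strategy, best = new_alloc, new_strategy, cost
    return alloc, strategy, best


# Example: three sites, two relations, two queries, poor starting allocation.
sites = [0, 1, 2]
sizes = {"R1": 10, "R2": 10}
queries = [
    {"origin": 1, "rels": ["R1", "R2"], "result_size": 1, "freq": 5},
    {"origin": 2, "rels": ["R2"], "result_size": 1, "freq": 1},
]
alloc, strategy, cost = iterate(queries, {"R1": 1, "R2": 2}, sizes, sites)
print(alloc, strategy, cost)  # {'R1': 1, 'R2': 1} [1, 1] 1.0
```

Starting from a cost of 50.0, one allocation step moves R2 next to R1 and the next query-optimization step re-plans both queries, reaching a cost of 1.0 after which no further half-step improves the design. Any other optimizer pair could replace the two `optimize_*` functions without changing the loop.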
Proceedings of the 17th International Conference on Very Large Data Bases, Barcelona, September 1991

Most of the individual design problems are NP-hard, so researchers have usually studied them in isolation to control complexity and maintain tractability. While this approach has led to effective solutions to parts of the overall system design, the interdependencies between individual problems are still not well understood. Query optimization and data allocation are two important distributed system design problems that are closely interrelated. Distributed query optimization depends on how the data are allocated, since processing schedules often include operations at different sites and data transmissions between them. Conversely, the optimal method of allocating data depends on the processing strategies used to answer queries. Typically, researchers studying data allocation assume a fixed query optimization method to generate processing schedules, while researchers studying query optimization assume a fixed data allocation. By assuming a solution to one problem and solving the other, researchers control the complexity of the two problems but fail to integrate their solutions. Comprehensive surveys of state-of-the-art research on distributed query optimization (e.g., [Yu and Chang 1984, Hevner and Yao 1987]) and distributed data allocation (e.g., [Dowdy and Foster 1982, Hevner and Rao 1988]) exist. The majority of research in one area has assumed a given solution for the other. Only a few researchers have investigated the inherent dependencies between the two design problems. Early research by Loomis and Popek [Loomis and Popek 1976] provides guidelines for data replication and allocation based on optimizing query strategies. They point out that multiple copies of data should be placed on a network to maximize parallel processing within queries.

In [Wah and Lien 1985], the authors analyze the interdependencies among data partitioning, data allocation, query optimization, concurrency control, and network design in local multi-access distributed systems. A broadcast protocol is proposed to promote the sharing of information in support of an integrated solution to these control problems. No specific integrated solution methods are detailed, however.

Apers develops a distributed data allocation algorithm that uses actual query processing schedules [Apers 1982, Apers 1988]. A virtual network is defined in which each database relation is assigned to a different virtual site and no relations are located at query sites. Distributed query optimization is performed to generate processing schedules. An optimal data allocation is then found by merging virtual data sites into actual network sites so as to minimize intermediate (relation-to-relation) transmission costs and final (result-to-query-site) transmission costs. Thus, Apers' method integrates the two problems during design by sequentially optimizing query strategies and then data allocation. During execution of the distributed system, query optimization is performed based on the determined data allocation. Apers proposes an extended 'dynamic heuristic' to achieve greater integration of the two problems during system design. However, the approach quickly becomes intractable as problem size increases.

Sacca and Wiederhold extend Apers' approach to data allocation in systems of tightly clustered processors [Sacca and Wiederhold 1985]. Their allocation model recognizes implicit dependencies arising from partitions of a data entity as well as dependencies based on user access patterns. Storage constraints of sites are also considered. The allocation process iterates between query optimization and data allocation based on a pairwise combination of data partitions at single processors in the cluster.
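The merging step in Apers' approach can be pictured with a simplified greedy procedure. The Python sketch below is hypothetical and is not Apers' published algorithm: it assumes the data volumes exchanged between nodes (relations and actual sites) in the schedules optimized on the virtual network are already known, merges the most heavily communicating pairs first (a Kruskal-style rule chosen here for brevity), and refuses to merge two groups already pinned to different actual sites. The names `merge_virtual_sites` and `residual_traffic` are illustrative.

```python
from collections import defaultdict


def merge_virtual_sites(traffic, relations, sites):
    """Greedily co-locate virtual relation sites with actual sites.

    traffic maps a node pair (u, v) to the data volume exchanged between
    them in schedules optimized on the virtual network; nodes are relation
    names or actual site names. Co-locating u and v removes that transfer.
    """
    cluster = {n: frozenset([n]) for n in [*relations, *sites]}

    def site_in(members):
        # The actual site a cluster is pinned to, or None if it has none.
        return next((n for n in members if n in sites), None)

    # Heaviest transfers first; never merge two clusters that are already
    # pinned to different actual sites.
    for (u, v), _vol in sorted(traffic.items(), key=lambda kv: -kv[1]):
        cu, cv = cluster[u], cluster[v]
        if cu == cv or (site_in(cu) is not None and site_in(cv) is not None):
            continue
        merged = cu | cv
        for n in merged:
            cluster[n] = merged

    alloc = {}
    for r in relations:
        target = site_in(cluster[r])
        if target is None:
            # Unpinned cluster: choose the site it exchanges the most data
            # with (ties, or no traffic at all, fall back to the first site).
            pull = defaultdict(float)
            for (u, v), vol in traffic.items():
                for a, b in ((u, v), (v, u)):
                    pinned = site_in(cluster[b])
                    if a in cluster[r] and pinned is not None:
                        pull[pinned] += vol
            target = max(sites, key=lambda s: pull[s])
        alloc[r] = target
    return alloc


def residual_traffic(traffic, alloc):
    # Volume that still crosses the network; actual sites map to themselves.
    def loc(n):
        return alloc.get(n, n)
    return sum(vol for (u, v), vol in traffic.items() if loc(u) != loc(v))


relations = ["R1", "R2", "R3", "R4"]
sites = ["S1", "S2"]
traffic = {("R1", "R2"): 8, ("R2", "S1"): 5, ("R3", "S2"): 4,
           ("R1", "S2"): 2, ("R3", "S1"): 1}
alloc = merge_virtual_sites(traffic, relations, sites)
print(alloc)                             # {'R1': 'S1', 'R2': 'S1', 'R3': 'S2', 'R4': 'S1'}
print(residual_traffic(traffic, alloc))  # 3
```

Note that the query schedules are fixed before merging, which is exactly the sequential (query strategies first, then allocation) character the text attributes to this method.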
This approach is shown to be effective on tightly coupled processors with small communication delays. An extension of the approach to general networks is not demonstrated. A methodology for distributed database design proposed by Mukkamala et al. includes an iterative integration of complex design problems [Mukkamala et al. 1988]. The methodology consists of a sequential application of algorithms to optimize relation partitioning, data allocation, query optimization, and load balancing. Design evaluation, guided by an expert system, indicates when further iterations are needed to meet design goals. An internal repeating loop is shown between the data allocation and query optimization algorithms. However, no specific details on the implementation of this iterative process are provided.

In this paper, we present an iterative method for integrating realistic query optimization and data allocation methods in distributed database design. In section 2, we describe an iterative heuristic method and discuss its flexibility and power. State-of-the-art query optimization and data allocation algorithms can be 'plugged' directly into the heuristic. In section 3, we demonstrate the use of the iterative heuristic in a 'simple query' environment. Cost models are developed to apply the heuristic both to minimizing query total times and to minimizing query response times. A simple example demonstrates how an 'optimal' data allocation solution can be significantly improved by integrating it with the query optimization problem. We have implemented selected query optimization and data allocation algorithms to perform the experimental studies presented in section 4. Test results show that the iterative method always outperforms alternative methods on both the total time and the response time problems, yielding results that are very close to optimal.
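The difference between the two objectives mentioned above, total time and response time, can be sketched for a single query strategy in which several relations are shipped in parallel to a processing site and the result is then sent to the query site. This Python sketch assumes a linear transmission-time model `c0 + c1 * size` (a common assumption in the distributed query optimization literature; the cost models the paper develops in section 3 may differ), non-interfering parallel transfers, and negligible local processing time. `transmission_time`, `strategy_times`, and the default constants are hypothetical.

```python
# Hypothetical cost sketch for one query strategy: relations are shipped
# concurrently to a processing site, then the result goes to the query site.
# Assumes transmission time = c0 + c1 * size, non-interfering parallel
# transfers, and negligible local processing time.

def transmission_time(size, c0=10.0, c1=1.0):
    return c0 + c1 * size


def strategy_times(shipments, result_size=None, c0=10.0, c1=1.0):
    """Return (total_time, response_time).

    shipments: sizes of the relations sent concurrently to the processing site.
    result_size: size of the result sent to the query site, or None if the
    query is processed at its own site.
    """
    ship = [transmission_time(s, c0, c1) for s in shipments]
    result = 0.0 if result_size is None else transmission_time(result_size, c0, c1)
    total = sum(ship) + result                  # every transfer is counted
    response = max(ship, default=0.0) + result  # parallel transfers overlap
    return total, response


print(strategy_times([100, 40], result_size=5))  # (175.0, 125.0)
print(strategy_times([100, 40]))                 # (160.0, 110.0)
```

Because response time counts only the longest of the parallel transfers, a strategy that moves more data concurrently can have a lower response time but a higher total time, so the two objectives can favor different query strategies and, through them, different data allocations.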
Finally, in section 5, we present conclusions and our future research directions.

2. An Iterative Heuristic Method for Distributed Database Design

The combined distributed query optimization/data allocation problem has an immense search space. Both optimization problems have individually been proven NP-hard. Eswaran proves that simple models of distributed data allocation are NP-hard [Eswaran 1974], and distributed query optimization has been shown to be NP-hard even in restricted query environments [Gavish and Segev 1986]. Thus, the combined problem is NP-hard, since a polynomial-time solution for the combined problem would imply a polynomial-time solution for each of the subproblems. A more complete analysis of the search space for the combined problem is found in [Blankinship 1991].

[Figure] Inputs: Network Sites and Topology; Database Relations; Queries and Query Frequencies
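To give a sense of how quickly the combined search space grows, the number of candidate designs can be counted directly for small cases. This Python sketch is illustrative only and is not the analysis in [Blankinship 1991]: it counts site assignments alone and assumes, hypothetically, that a query strategy is just the choice of one processing site. Real strategies (join orders, semijoins, parallel schedules) make the space far larger. `allocation_count` and `combined_count` are hypothetical names.

```python
def allocation_count(num_sites, num_relations, replicated=False):
    # Non-replicated: each relation is stored at exactly one site.
    # Replicated: each relation is stored at any non-empty subset of sites.
    per_relation = (2 ** num_sites - 1) if replicated else num_sites
    return per_relation ** num_relations


def combined_count(num_sites, num_relations, num_queries, replicated=False):
    # Hypothetical strategy space: each query picks a single processing site,
    # independently of the allocation chosen.
    return allocation_count(num_sites, num_relations, replicated) * num_sites ** num_queries


# Growth for 5 relations and 3 queries as the number of sites increases.
for n in (2, 4, 8):
    print(n,
          allocation_count(n, 5),
          allocation_count(n, 5, replicated=True),
          combined_count(n, 5, 3))
```

With 4 sites and 5 relations there are already 1024 non-replicated allocations (759375 with replication), and pairing them with a processing-site choice for each of 3 queries gives 65536 combined designs, which is why exhaustive search is practical only for very small problems.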
Publication date: 1991